String Resemblance Systems: A Unifying Framework for String Similarity with Applications to Literature and Music

Author

  • Masayuki Takeda
Abstract

Identifying similar objects in a large collection is a fundamental technique in several areas of computer science, e.g., case-based reasoning and machine discovery. Strings are the most basic representation of objects inside computers, and thus string similarity is one of the most important topics in computer science. A similarity measure must be sensitive to the kind of differences we wish to quantify. The weighted edit distance is one such framework, in which the measure can be varied by altering the weight assigned to each edit operation depending on the symbols involved. However, it does not suffice to solve ‘real problems’ (see, e.g., [2]). We consider that two objects that seem similar necessarily share a common structure, and that the degree of similarity depends on how valuable that common structure is. Based on this intuition, we present a unifying framework named the string resemblance system (SRS, for short). In this framework, the similarity of two strings is viewed as the maximum score of a pattern that matches both of them. The measures therefore differ only in the choice of (1) the pattern set to which common patterns belong, and (2) the pattern score function, which assigns a score to each pattern. For example, if we choose the set of patterns with variable-length don’t cares and define the score of a pattern to be the number of symbols in it, then the resulting measure is the length of the longest common subsequence (LCS) of the two strings. Indeed, the strings acdeba and abdac have the common pattern a?d?a?, where each ? is a variable-length don’t care; the pattern contains three symbols. With this framework one can easily design and modify one’s own measures. In this paper we briefly describe SRSs and then report successful applications to literature and music.
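The LCS example in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's implementation: the function names (matches, srs_similarity, lcs_length) are hypothetical, a pattern is assumed to be a sequence of symbols with a variable-length don't care allowed before, between, and after them, and the maximum is found by brute force, which is practical only for short strings.

```python
from itertools import combinations


def matches(symbols: str, text: str) -> bool:
    """Assumed pattern semantics: the symbols must occur in `text` in order,
    with variable-length don't cares (possibly empty) around and between them.
    This is the same as `symbols` being a subsequence of `text`."""
    remaining = iter(text)
    # `ch in remaining` consumes the iterator up to the first occurrence of ch.
    return all(ch in remaining for ch in symbols)


def srs_similarity(x: str, y: str, score=len) -> int:
    """Maximum score over all patterns that match both x and y.
    Brute force: every pattern matching x is a subsequence of x, so we
    enumerate those and keep the ones that also match y."""
    best = score("")
    for k in range(1, len(x) + 1):
        for positions in combinations(range(len(x)), k):
            symbols = "".join(x[i] for i in positions)
            if matches(symbols, y):
                best = max(best, score(symbols))
    return best


def lcs_length(x: str, y: str) -> int:
    """Standard dynamic-programming LCS length, used here as a cross-check."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, 1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # The abstract's example: both strings share a pattern with three symbols.
    print(srs_similarity("acdeba", "abdac"))  # 3
    print(lcs_length("acdeba", "abdac"))      # 3
```

With score=len the brute-force measure agrees with the LCS length on this example. Passing a different score function (for instance one that weights some symbols more heavily) changes the measure without changing the pattern set, which is the kind of modification the framework is meant to make easy.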

Similar resources

A new virtual leader-following consensus protocol to internal and string stability analysis of longitudinal platoon of vehicles with generic network topology under communication and parasitic delays

In this paper, a new virtual leader following consensus protocol is introduced to perform the internal and string stability analysis of longitudinal platoon of vehicles under generic network topology. In all previous studies on multi-agent systems with generic network topology, the control parameters are strictly dependent on eigenvalues of network matrices (adjacency or Laplacian). Since some ...

Asymptotic Approximations of the Solution for a Traveling String under Boundary Damping

Transversal vibrations of an axially moving string under boundary damping are investigated. Mathematically, it represents a homogeneous linear partial differential equation subject to nonhomogeneous boundary conditions. The string is moving with a relatively (low) constant speed, which is considered to be positive. The string is kept fixed at the first end, while the other end is tied with the ...

Generalized Similarity Kernels for Efficient Sequence Classification

String kernel-based machine learning methods have yielded great success in practical tasks of structured/sequential data analysis. In this paper we propose a novel computational framework that uses general similarity metrics and distance-preserving embeddings with string kernels to improve sequence classification. An embedding step, a distance-preserving bitstring mapping, is used to effectivel...

Scalable Algorithms for String Kernels with Inexact Matching

We present a new family of linear time algorithms for string comparison with mismatches under the string kernels framework. Based on sufficient statistics, our algorithms improve theoretical complexity bounds of existing approaches while scaling well in sequence alphabet size, the number of allowed mismatches and the size of the dataset. In particular, on large alphabets and under loose mismatc...

Quantum Vacuum Effects for a Massive Bosonic String in the Presence of a Background Field

We study the Casimir effect for a Bosonic string extended between D-branes, and living in a flat space with an antisymmetric background B-field. We find the Casimir energy as a function of the B-field, and the mass-parameter of the string, and accordingly we obtain a B-dependence correction term to the ground-state mass of the string. We show that for sufficiently large B-field, the ground stat...



Publication date: 2001